How to analyse evolutionary algorithms
Authors: H.-G. Beyer, H.-P. Schwefel, I. Wegener
Abstract
Many variants of evolutionary algorithms have been designed and applied. The experimental knowledge is immense. The rigorous analysis of evolutionary algorithms is difficult, but such a theory can help to understand, design, and teach evolutionary algorithms. In this survey, first the history of attempts to analyse evolutionary algorithms is described and then new methods for continuous as well as discrete search spaces are presented and discussed.
© 2002 Elsevier Science B.V. All rights reserved.

∗ Corresponding author. E-mail addresses: [email protected] (H.-G. Beyer), [email protected] (H.-P. Schwefel), [email protected] (I. Wegener).
1 This author is supported as Heisenberg fellow of the DFG under grant Be 1578/4-2.
2 These authors were supported by the Deutsche Forschungsgemeinschaft (DFG) as part of the Collaborative Research Center "Computational Intelligence" (SFB 531).
Theoretical Computer Science 287 (2002) 101–130. 0304-3975/02/$ see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S0304-3975(02)00137-8

1. Some history of evolutionary algorithms

Evolutionary algorithms (EA) form a class of probabilistic optimization methods that are inspired by some presumed principles of organic evolution. Whether such inspiration is helpful or hampering, a neutral side aspect, or an opportunity to build bridges between the islands of different disciplines forming the cluster of human knowledge may be debated controversially, but not in this contribution. It is simply a matter of fact that EA have become a welcome tool for tackling the search for extrema that withstand classical approaches, e.g. optimal parameters within simulation models [79]. That we subsequently mention only three spatially different though nearly contemporaneous sources (the earliest traces all go back to the early 1960s; instead we cite some later but better-known ones),
• evolutionary programming (EP) [35],
• genetic algorithms (GA) [46],
• evolution strategies (ES) [70,78],
does not mean that there were no other inventors of the same or at least similar ideas. Fogel [33] has attempted to collect a fossil record of the early birds in the field. This field, called evolutionary computation (EC) since members of the three teams mentioned above met at conferences like Parallel Problem Solving from Nature (PPSN) [82], the International Conference on Genetic Algorithms (ICGA) [8], and Evolutionary Programming (EP) [34], has found a home in computer science under the roof of computational intelligence (CI), soft computing, or bio-inspired or natural computation, together with two other fields, i.e. neural and fuzzy computation. A series of three handbooks [7,30,76] as well as concurrent conferences held every four years since 1994 under the umbrella "World Congress on Computational Intelligence" [55,36] may serve as witnesses of the broad interest this set of methods has recently gained.

The general frame of EP, GA, and ES is essentially the same and is very simply summarized as a loop over partially randomized variation and selection operators that steer exploration and exploitation (or chance and necessity) and that, in contrast to traditional optimization procedures, act upon a set of search points in the decision variable space. That is why some of the theoretical investigations mentioned later lead to results that are valid for nearly all simple EA. Nevertheless, due to their different origins, some features of the "canonical" versions of the algorithms are quite specific, and some people still speak of schools or demes that have emphasized, or still emphasize, their beloved flourish. Therefore, a few remarks about the three kindergartens seem appropriate. To do this we use the popular nomenclature (see [18]).
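The shared frame described above, a loop of partially randomized variation followed by selection that acts on a whole population of search points, can be sketched as a generic Python skeleton. This is a minimal illustration under our own assumptions, not code from the surveyed work; the names `evolve`, `init`, `vary`, and `select` are hypothetical, and the OneMax demo (bit-flip mutation plus selection of the μ best) is just one possible instantiation.

```python
import random
from typing import Callable, List, TypeVar

T = TypeVar("T")


def evolve(
    init: Callable[[], List[T]],
    vary: Callable[[List[T]], List[T]],
    select: Callable[[List[T], List[T]], List[T]],
    generations: int,
) -> List[T]:
    """Generic EA skeleton: repeat randomized variation and selection on a population."""
    population = init()
    for _ in range(generations):
        offspring = vary(population)                 # variation: exploration ("chance")
        population = select(population, offspring)   # selection: exploitation ("necessity")
    return population


if __name__ == "__main__":
    # Hypothetical instantiation: OneMax (maximize the number of ones) with
    # bit-flip mutation at rate 1/n and plus selection of the mu best.
    rng = random.Random(1)
    n, mu = 20, 5

    def init() -> List[List[int]]:
        return [[rng.randint(0, 1) for _ in range(n)] for _ in range(mu)]

    def vary(pop: List[List[int]]) -> List[List[int]]:
        return [[1 - b if rng.random() < 1.0 / n else b for b in x] for x in pop]

    def select(parents: List[List[int]], offspring: List[List[int]]) -> List[List[int]]:
        return sorted(parents + offspring, key=sum, reverse=True)[:mu]

    best = evolve(init, vary, select, generations=200)[0]
    print("ones in best individual:", sum(best))
```

Swapping `vary` and `select` is all it takes to move between the EP, GA, and ES flavours discussed next; the loop itself stays the same.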
It should be intuitive enough that we do not need sophisticated definitions here for an individual (set of variables), its fitness (objective function value), or a generation (one iteration loop with parents and their offspring), etc.

Evolutionary programming (EP) was first devised to let finite state machines become more and more "intelligent by means of simulated evolution". One or more of a couple of distinct small manipulations of the state diagram of a parent machine, i.e. a (uniformly distributed random) mutation, yields an offspring. Usually, each parent creates one child. No recombination is applied. Selection takes place as a series of tournaments (the counterpart of the proverbial "struggle for life"), each with a subset of the contemporary competitors. The individuals earning the highest scores, exactly 50%, enter the next generation. Later, Fogel [32] revised his father's original EP in different ways, some of which more or less resemble the evolution strategies used for real-valued parameter optimization. Not making use of recombination has remained a "philosophical" distinction from all other EA (see [31]). We do not discuss this further than mentioning that the evolving entities are thought of as species instead of individuals, and, by definition, species do not exchange genetic material/information.

Genetic algorithms (GA) initially served as simplified models of organic evolution in order to investigate adaptation capabilities that might serve as useful examples for other disciplines as well. Although older members of this school still emphasize today that GA are not optimization methods, it is precisely in that domain that they have earned appreciation, including money. The evolving entities are genomes carrying the phenotypic characteristics in coded form, usually using an alphabet of low cardinality, and consequently in binary form on a digital computer.
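The EP selection scheme described above can be sketched as follows, assuming a comparable fitness to be maximized and `q` random opponents per tournament (q must not exceed the number of other candidates). The function names and the parameter `q` are illustrative assumptions; classical EP operated on finite state machines, which this sketch hides behind a user-supplied `mutate` function.

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def ep_tournament_select(
    candidates: Sequence[T],
    fitness: Callable[[T], float],
    q: int,
    rng: random.Random,
) -> List[T]:
    """Score each candidate against q random opponents; keep the better-scoring half.

    A candidate earns one point per opponent whose fitness it matches or exceeds
    (maximization). Exactly len(candidates) // 2 individuals survive.
    """
    fits = [fitness(c) for c in candidates]
    n = len(candidates)
    scores = []
    for i in range(n):
        opponents = rng.sample([j for j in range(n) if j != i], q)
        scores.append(sum(fits[i] >= fits[j] for j in opponents))
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [candidates[i] for i in ranked[: n // 2]]


def ep_generation(
    parents: Sequence[T],
    mutate: Callable[[T, random.Random], T],
    fitness: Callable[[T], float],
    q: int,
    rng: random.Random,
) -> List[T]:
    """One EP generation: every parent creates exactly one child by mutation only
    (no recombination); parents and children then compete in the tournaments."""
    offspring = [mutate(p, rng) for p in parents]
    return ep_tournament_select(list(parents) + offspring, fitness, q, rng)
```

Because parents and children compete together and exactly half survive, the population size stays constant from generation to generation.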
The initial population is typically generated by drawing all bits with equal probability for zeros and ones (or by purely random settings within non-binary finite search regions). The main variation operator is recombination, more precisely crossover, e.g. two-point crossover. In this case, the bitstrings of two parents are cut at two random positions and recombined by exchanging the inner parts between the parents, thus creating two offspring at a time. Discussions whether it is better to use both or only one of them are still ongoing. Not all reproductions involve recombination (canonically 30% do not), so that some individuals are either clones or survivors from the last generation. Mutation, i.e. flipping a bit at this or that position, was introduced with low probability (e.g. 0.1%) to prevent a small population from prematurely losing a still needed one or zero. In many applications, higher mutation as well as crossover probabilities have become popular, e.g. 1/n as mutation probability for a genome with n bits and one as crossover probability. Selection takes place when the partners are drawn for recombination. Individuals with higher fitness values (in case of minimization, of course, those with lower objective function values) are preferred. This may be done by ranking the individuals or, canonically, by giving them a chance that is proportional to their (always positive, if necessary transformed) fitness.

Evolution strategies (ES) were devised as experimental optimization techniques, e.g. to drive a flexible device step by step into its optimal state. The first experiments were performed with just one ancestor and one descendant per generation, and mutations were created by subtracting two numbers drawn from a binomial distribution. The ancestor was replaced by its offspring if the latter was not worse than the former. As soon as computers became available, this two-membered or (1+1)-ES was accompanied by the multimembered version with recombination.
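A minimal Python sketch of the canonical GA operators mentioned above: two-point crossover, bit-flip mutation with probability p per bit (e.g. 1/n), and fitness-proportional (roulette-wheel) selection while drawing mating partners. The function names and the way one generation is assembled in `ga_generation` are our assumptions for illustration; as the text requires for proportional selection, all fitness values must be positive.

```python
import random
from typing import Callable, List, Sequence, Tuple

Bits = List[int]


def two_point_crossover(a: Bits, b: Bits, rng: random.Random) -> Tuple[Bits, Bits]:
    """Cut both parents at the same two random positions and swap the inner parts."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def bit_flip_mutation(x: Bits, p: float, rng: random.Random) -> Bits:
    """Flip every bit independently with probability p (e.g. p = 1/n)."""
    return [1 - bit if rng.random() < p else bit for bit in x]


def proportional_selection(pop: Sequence[Bits], fits: Sequence[float], rng: random.Random) -> Bits:
    """Roulette wheel: individual i is drawn with probability fits[i] / sum(fits)."""
    return rng.choices(pop, weights=fits, k=1)[0]


def ga_generation(
    pop: Sequence[Bits],
    fitness: Callable[[Bits], float],
    p_c: float,
    rng: random.Random,
) -> List[Bits]:
    """One generational GA step (sketch): proportional selection of mating partners,
    two-point crossover with probability p_c, then bit-flip mutation at rate 1/n."""
    n = len(pop[0])
    fits = [fitness(x) for x in pop]  # must all be positive
    new_pop: List[Bits] = []
    while len(new_pop) < len(pop):
        a = proportional_selection(pop, fits, rng)
        b = proportional_selection(pop, fits, rng)
        if rng.random() < p_c:
            a, b = two_point_crossover(a, b, rng)
        new_pop += [bit_flip_mutation(a, 1.0 / n, rng), bit_flip_mutation(b, 1.0 / n, rng)]
    return new_pop[: len(pop)]
```

Pairs that skip crossover (probability 1 − p_c) pass on copies of their parents, which is where the "clones" mentioned above come from; mutation may still alter them.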
Now, μ parents create λ offspring within one reproduction cycle. Two or even more parents are involved in the recombination step, two extreme forms of which are called discrete (or dominant) and intermediate recombination, respectively. In the case of intermediate recombination, the average of the parental variable values is transferred to the offspring, whereas discrete recombination (like uniform crossover in GA) chooses each component from one of the parents at random. No check is imposed that the parents involved are all different, and there is no mating selection: all parents have the same chance to be chosen. In addition to 100% recombination, 100% mutation takes place with maximum entropy probability distributions (geometric for integer variables) or probability densities (normal for continuous variables). If the parents for the next generation are drawn from the offspring only (this scheme is called (μ, λ)-ES), there must obviously be a birth surplus, λ > μ. Otherwise, in the (μ + λ)-ES, the parents take part in selection, too; its extreme form with λ = 1 is called "steady state", like the corresponding GA version. Selection is performed in a strictly deterministic manner and has been called truncation selection, because all individuals except the μ best are discarded/forgotten. The best individual so far may be stored outside the population, of course. Both comma and plus selection are the extremes of a more general (μ, κ, λ, ρ)-ES, with κ as the upper limit of the number of reproduction cycles an individual stays in the population (κ = 1 gives comma, κ = ∞ plus selection) and ρ as the number of parents involved in the recombination step for each offspring. The special notation (μ/μ, λ)-ES stands for a comma version with so-called multirecombination, i.e.
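The ES ingredients described above can be sketched in Python: intermediate and discrete recombination over ρ mates (ρ being the number of parents per recombination), Gaussian mutation of every offspring, and deterministic comma or plus truncation selection. This sketch assumes minimization and one fixed mutation strength `sigma` for simplicity; all names are hypothetical, and strategy-parameter adaptation is deliberately left out.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]


def intermediate_recombination(mates: Sequence[Vector]) -> Vector:
    """Offspring gets the component-wise mean of the rho mates."""
    rho = len(mates)
    return [sum(m[k] for m in mates) / rho for k in range(len(mates[0]))]


def discrete_recombination(mates: Sequence[Vector], rng: random.Random) -> Vector:
    """Each component is copied from a mate chosen at random (cf. uniform crossover)."""
    return [rng.choice(mates)[k] for k in range(len(mates[0]))]


def truncation_selection(
    parents: Sequence[Vector],
    offspring: Sequence[Vector],
    f: Callable[[Vector], float],
    mu: int,
    plus: bool,
) -> List[Vector]:
    """Deterministic (mu, lambda) or (mu + lambda) selection, minimizing f."""
    if not plus and len(offspring) <= mu:
        raise ValueError("comma selection needs a birth surplus (lambda > mu)")
    pool = list(offspring) + (list(parents) if plus else [])
    return sorted(pool, key=f)[:mu]


def es_generation(
    parents: Sequence[Vector],
    f: Callable[[Vector], float],
    lam: int,
    rho: int,
    sigma: float,
    plus: bool,
    rng: random.Random,
) -> List[Vector]:
    """(mu/rho, lambda) or (mu/rho + lambda) step with intermediate recombination."""
    offspring = []
    for _ in range(lam):
        # No mating selection: mates are drawn uniformly, repetitions allowed.
        mates = [rng.choice(parents) for _ in range(rho)]
        child = intermediate_recombination(mates)
        # Every offspring is mutated with isotropic normal noise.
        offspring.append([c + sigma * rng.gauss(0.0, 1.0) for c in child])
    return truncation_selection(parents, offspring, f, len(parents), plus)
```

Setting `rho = len(parents)` in every draw approximates the multirecombinant (μ/μ, λ) case, while `rho = 1` disables recombination.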
inheriting to each descendant parameter values that represent the average over μ parents, the ultimate case being ρ = μ in one direction and ρ = 1 (no recombination) in the other.

Further descendants of these three early variants are now collected under the notion of recombinant evolutionary algorithms (EA). Hundreds if not thousands of other incarnations have been proposed and applied. A database of US patents revealed 67 procedures that bear the name GA in their title, despite unfinished discussions about when an EA is no longer a GA. For quite a while binary encoding of the decision variables seemed to be a necessary ingredient, until real-coded GA entered the literature (see, e.g., [29]), even with deterministic truncation selection [60]. Since probably more than 2000 articles have been published annually for a couple of years (see [1]), it is more likely than not that some features of the strategies are reinvented, probably under different names, and, conversely, identical names do not guarantee identical features. Some recently introduced crossover operators produce variations that are traditionally expected under the name mutation.

Until recently, the number of rigorously proven facts about the behavior of EA has been rather small. Nevertheless, there have been some strong beliefs upon which decisions about choosing one or the other version have been based. Some of them turned out to be wrong; others are still unproven hypotheses or summaries of empirical experience. Repeating arguments and counter-arguments from finished or still ongoing discussions would fill too many pages and would bore the uninitiated. That is why we restrict our report to only some, perhaps central, discussions of the past and then turn to the present, especially to the most recent hard facts.

First analyses of ES performance concentrated on the so-called progress velocity, i.e. the average distance in the search space traveled in the useful direction per function evaluation.
This local measure was considered for the two-membered ES with uniform random discrete mutations in the Moore neighborhood of the parent on an inclined plane, a parabolic ridge, and a parabolic top with circular level lines. The useful direction was the gradient direction in the case of the inclined plane and the straight line connecting the vertices of the parabolic level lines in the case of the ridge; in the case of the top, any reduction of the distance to the summit was considered useful. Schwefel [77] observed that such discrete mutations can lead to stagnation of the search somewhere on the ridge and to a considerable decrease of the progress velocity when approaching the hilltop. He proposed more versatile variation schemes with smaller as well as larger mutations, e.g. according to a Gaussian probability density with zero mean and given standard deviation for each (continuous) variable. For such continuous mutations Rechenberg [70] found asymptotic approximations of the progress velocity of a two-membered ES on two model functions: a spherical model like the parabolic top above, and a corridor model, which resembles an n-dimensional rectangular ridge. In both cases the progress rate (expected distance traveled per objective function call) depends only on the number of variables, the standard deviation of the mutations (the same for all directions), and a topology parameter, i.e. the distance from the optimum in the case of the hypersphere or the corridor width (the same for the n − 1 perpendicular directions in the n-dimensional space) in the case of the rectangular ridge. Dividing the progress velocity and the standard deviation by the topology parameter and multiplying both by the number of variables, the formulas become simple relations between the normalized progress rate and the normalized "mean step size", i.e. the square root of the single mutation variance.
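For the sphere model, the normalization just described can be written out explicitly, with R the distance to the optimum, n the number of variables, φ the progress rate, and σ the mutation standard deviation. The second line quotes the standard asymptotic (n → ∞) result for the (1+1)-ES from the ES literature rather than deriving it here; Φ denotes the standard normal distribution function and P_s the success probability.

```latex
\[
\varphi^* = \varphi\,\frac{n}{R}, \qquad \sigma^* = \sigma\,\frac{n}{R}
\]
\[
\varphi^*_{1+1}(\sigma^*) \simeq \frac{\sigma^*}{\sqrt{2\pi}}\,
  \exp\!\left(-\frac{\sigma^{*2}}{8}\right)
  - \frac{\sigma^{*2}}{2}\left[1 - \Phi\!\left(\frac{\sigma^*}{2}\right)\right],
\qquad
P_s(\sigma^*) \simeq 1 - \Phi\!\left(\frac{\sigma^*}{2}\right)
\]
```

Evaluated numerically, this expression peaks near σ* ≈ 1.22 with φ* ≈ 0.20 and P_s ≈ 0.27, and P_s decreases monotonically in σ*, which is the property a success-based step-size rule can exploit.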
This relation has a maximum that in both cases corresponds to a success probability (the probability of replacing the parent by the offspring) of roughly 20%. If the standard deviation is smaller than at this maximum, the success probability is higher but the search is slower; if, however, the mean step size is larger than optimal, both the progress rate and the success probability decline until they vanish for infinitely large mutations. At least 50% of the maximal progress rate can be achieved within an "evolution window", a range of about one decade of standard deviation values. The monotonicity of the success probability in the mutation strength has led to a simple rule for adjusting the latter (the 1/5 success rule): increase the mutation strength if more than one fifth of the mutations are successful, and decrease it if fewer are.

This investigation was extended by Schwefel [78,80] to multimembered ES with λ descendants per generation and just one parent, thus necessarily without recombination. Both the comma and the plus versions were considered. The asymptotic approximations of the "universal" laws for normalized progress velocity over normalized standard deviation are of the same type as above for all plus versions including λ = 1, but they differ substantially for the comma ES when the standard deviations exceed their optimal values by far. Negative progress rates indicate divergence of the optimum-seeking process when the mutation steps become too large. The maxima of the progress-rate curves increase sublinearly with the number λ of descendants per generation and differ only negligibly between plus and comma strategies. First empirical results about a positive influence of recombination on the expected progress velocity of a (μ + 1)-ES were already obtained by Rechenberg [70]. It is therefore surprising that, more often than not, people argue that recombination is a secondary variation operator in ES (in contrast to GA, where mutations really were thought to be of secondary importance for a long time).
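The 1/5 success rule can be sketched as a (1+1)-ES in Python. The observation window of 10 mutations and the adjustment factor 0.85 are assumptions chosen for illustration, not values prescribed by the text; the test function is the sphere model, and the offspring replaces the parent whenever it is not worse, as in the early experiments.

```python
import random
from typing import Callable, List, Tuple


def one_plus_one_es(
    f: Callable[[List[float]], float],
    x0: List[float],
    sigma0: float,
    iterations: int,
    rng: random.Random,
    window: int = 10,
    factor: float = 0.85,
) -> Tuple[List[float], float, float]:
    """(1+1)-ES with the 1/5 success rule, minimizing f.

    After every `window` mutations the observed success rate is compared with 1/5:
    a higher rate means the steps are too small (enlarge sigma), a lower rate
    means they are too large (shrink sigma).
    """
    x, fx, sigma = list(x0), f(x0), sigma0
    successes = 0
    for t in range(1, iterations + 1):
        y = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]  # isotropic Gaussian mutation
        fy = f(y)
        if fy <= fx:  # the offspring replaces the parent if it is not worse
            x, fx = y, fy
            successes += 1
        if t % window == 0:
            rate = successes / window
            if rate > 0.2:
                sigma /= factor
            elif rate < 0.2:
                sigma *= factor
            successes = 0
    return x, fx, sigma


if __name__ == "__main__":
    sphere = lambda v: sum(c * c for c in v)
    x, fx, sigma = one_plus_one_es(sphere, [1.0] * 10, 1.0, 3000, random.Random(42))
    print(f"f = {fx:.3e}, final sigma = {sigma:.3e}")
```

On the sphere, sigma shrinks roughly in proportion to the distance to the optimum, which keeps the normalized step size inside the evolution window.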
Self-adaptation of the mutation strength(s) has been considered of utmost importance from the very beginning of ES history. Some form of step-size control is an ingredient of all classical optimization procedures. Whereas step-size control in that domain relies on a more or less sophisticated internal model of the (local) response surface (otherwise called fitness landscape) and on rational processing of the information usually gathered over a series of iterations, a self-adaptive ES has to treat the objective function as a black box and to operate with less knowledge about its historical pathway (in the case of mostly haploid individuals with just one set of genes). Early empirical investigations [78,80] led to the belief that under certain conditions such self-adaptation without exogenous control can be achieved, but not under the (μ + 1) or steady state scheme, because decreasing the mutation strength is always rewarded via an increased success rate. The so-called mutative step-size control operates with individuals that are characterized not only by their vector of object variables, but additionally by one standard deviation used for creating the offspring, or even by more strategy parameters controlling mutations with more general normal probability density distributions. A birth surplus seems indispensable in order to give the optimal mutation step size a chance to succeed within just one generation. This led to proposing ES with λ > 1, more generally with as many descendants as are necessary to allow at least one descendant per parent that improves the objective function. Calling the ratio λ/μ birth surplus or selection pressure, this ratio would have to be equal to or higher than the inverse of the success probability corresponding to the optimal mutation strength with maximal progress velocity. Under such a premise, even up to n different step sizes for the n variables could be envisaged, if μ was not too small [81].
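Mutative step-size control as described above can be sketched as a (μ, λ)-ES in which every individual carries one standard deviation. The log-normal update σ' = σ·exp(τ·N(0, 1)) with τ = 1/√(2n) is a common choice that we assume here; it is not specified in the text. Comma selection is used because, as noted above, plus or steady-state schemes tend to reward ever smaller steps; the function name is hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple

Individual = Tuple[List[float], float]  # (object variables y, mutation strength sigma)


def sigma_self_adaptive_es(
    f: Callable[[List[float]], float],
    mu: int,
    lam: int,
    y0: List[float],
    sigma0: float,
    generations: int,
    rng: random.Random,
) -> Individual:
    """(mu, lambda)-ES with mutative self-adaptation of a single step size (minimization).

    Each offspring inherits sigma from a randomly chosen parent, mutates it
    log-normally, and then uses the new sigma to mutate its object variables.
    Selection looks only at f(y); a step size survives only together with the
    object vector it produced.
    """
    if lam <= mu:
        raise ValueError("a birth surplus lambda > mu is required")
    tau = 1.0 / math.sqrt(2.0 * len(y0))  # learning rate: an assumed, common choice
    parents: List[Individual] = [(list(y0), sigma0) for _ in range(mu)]
    for _ in range(generations):
        offspring: List[Individual] = []
        for _ in range(lam):
            y, sigma = rng.choice(parents)
            s = sigma * math.exp(tau * rng.gauss(0.0, 1.0))  # mutate strategy parameter first
            z = [yi + s * rng.gauss(0.0, 1.0) for yi in y]   # then the object variables
            offspring.append((z, s))
        parents = sorted(offspring, key=lambda ind: f(ind[0]))[:mu]  # comma selection
    return parents[0]


if __name__ == "__main__":
    sphere = lambda v: sum(c * c for c in v)
    y, sigma = sigma_self_adaptive_es(sphere, 3, 15, [1.0] * 10, 0.3, 400, random.Random(1))
    print(f"f = {sphere(y):.3e}, sigma = {sigma:.3e}")
```

No exogenous schedule touches sigma here; it changes only through mutation and the indirect pressure of selection on f.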
Dreams of incorporating even more degrees of freedom of the normal distribution by introducing the full correlation matrix with up to n(n − 1)/2 non-zero correlation coefficients could not be fully realized at that time due to a lack of computing power. Rudolph [72] conjectured that Θ(n²) individuals in an ES population might be necessary in order to adapt so many strategy parameters representing the "internal models" of the individuals' environment.

Despite enduring controversial discussions, Holland's schema theorem [46] is still a cornerstone of GA theory. A schema is a bitstring with one or more "don't care" symbols "∗" and thus represents 2^d different bitstrings, with d the number of "∗" symbols. Holland expressed the expected number of offspring representing some schema after applying proportional selection, one-point crossover, and mutation in terms of an inequality with the number of parents belonging to the same schema on the right-hand side, multiplied by three factors. The first factor is the average fitness of the parental schema divided by the average fitness of the whole population; this factor is thus greater than one for above-average parents (on the premise of diversity among the parents). The other two factors are less than one: they are the probabilities that the schema survives recombination and mutation, i.e. one minus the probabilities of harmful recombinations and harmful mutations. The first factor has been rewritten as 1 + c, and assuming c to be constant over several generations has led to the belief in an exponential increase of the number of above-average fit parental schemata. But c must vanish when approaching an optimum, and the influence of the other factors, being detrimental, finally dominates if the mutation and recombination probabilities do not vanish. Rudolph [74] found that a canonical (non-elitist) GA finally fluctuates at a certain distance from the optimum, because the best positions get lost again and again. This corresponds, by the way, to the continuous Fisher–Eigen model and its findings (see [54]).
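The schema notions above can be made concrete in Python. The sketch enumerates the 2^d instances of a schema and evaluates the bound in the textbook form of the schema theorem, m(H, t+1) ≥ m(H, t) · f(H)/f̄ · [1 − p_c·δ(H)/(ℓ − 1)] · (1 − p_m)^o(H), where δ(H) is the defining length, o(H) the order, and ℓ the string length. This exact form is an assumption taken from common textbook presentations, not a formula stated in this survey; the function names are hypothetical.

```python
from itertools import product
from typing import List


def schema_instances(schema: str) -> List[str]:
    """All bitstrings matched by a schema over {'0', '1', '*'}: 2**d of them for d stars."""
    slots = [("0", "1") if c == "*" else (c,) for c in schema]
    return ["".join(bits) for bits in product(*slots)]


def order(schema: str) -> int:
    """Number of fixed (non-'*') positions."""
    return sum(c != "*" for c in schema)


def defining_length(schema: str) -> int:
    """Distance between the first and the last fixed position."""
    fixed = [i for i, c in enumerate(schema) if c != "*"]
    return fixed[-1] - fixed[0] if fixed else 0


def schema_bound(m: float, f_schema: float, f_mean: float, schema: str, p_c: float, p_m: float) -> float:
    """Schema-theorem lower bound on the expected number of schema members in the next
    generation (proportional selection, one-point crossover, bit-flip mutation)."""
    length = len(schema)
    # Lower bound on surviving crossover: only cuts inside the defining length can disrupt.
    survive_crossover = 1.0 - p_c * defining_length(schema) / (length - 1) if length > 1 else 1.0
    # Probability that no fixed position is flipped by mutation.
    survive_mutation = (1.0 - p_m) ** order(schema)
    return m * (f_schema / f_mean) * survive_crossover * survive_mutation
```

Writing f_schema / f_mean as 1 + c makes the argument in the text visible: the bound grows geometrically only while c stays positive and the two survival factors stay close to one.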
Since it neglects improvements by mutation and recombination, the schema theorem does not help in modeling the progress velocity in terms of the best solution found so far within a finite population.

Another strong belief concerning GA is the so-called building block hypothesis (BBH, see [38]). It states that recombination, e.g. one-point crossover, often makes it possible to put together good parts of one parental bitstring with good other parts of the second parent, delivering an even better combination of both in an offspring. This argument resembles in some way the situation in continuous search spaces where improving steps in several independent directions can be superimposed with an overall positive effect. But this happens only if the objective function is decomposable in some way and the corresponding n independent directions can be found. In general, such decomposable objective functions are rarely given, and if they are, n one-dimensional line searches suffice for finding the optimum. For a more detailed discussion see [40] and [74].

Finally, we can ask whether we really need EA, and whether EA need features of organic evolution or not. The second question may be answered by the infamous "yes and no". No, because any idea that improves an algorithm for solving a given problem is legitimate, whether it resembles biological prototypes or not. The best way to handle a given problem would be the invention of a special method, even a best one if it exists. Its goodness depends merely on our knowledge or ignorance of the problem's characteristics. Yes, because otherwise the name of the method should be changed, or it becomes deceptive. At least some researchers (like Holland) insist that EA are an instrument to learn about natural processes. The first, even broader question presumably does not lead to an answer which could be agreed upon by all people. Again, one might call for special methods for special problems.
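To illustrate the point about decomposable functions: if f(x) = f_1(x_1) + ... + f_n(x_n), each coordinate can be optimized on its own, so n one-dimensional line searches reach the optimum. The sketch below assumes each f_i is unimodal on a known interval and uses golden-section search as the line search; both that choice and the function names are our assumptions.

```python
import math
from typing import Callable, List, Sequence


def golden_section(g: Callable[[float], float], lo: float, hi: float, tol: float = 1e-9) -> float:
    """Minimize a unimodal one-dimensional function g on [lo, hi]."""
    r = (math.sqrt(5.0) - 1.0) / 2.0  # inverse golden ratio, about 0.618
    a, b = lo, hi
    c, d = b - r * (b - a), a + r * (b - a)
    while b - a > tol:
        if g(c) < g(d):  # minimum lies in [a, d]
            b, d = d, c
            c = b - r * (b - a)
        else:            # minimum lies in [c, b]
            a, c = c, d
            d = a + r * (b - a)
    return (a + b) / 2.0


def minimize_separable(terms: Sequence[Callable[[float], float]], lo: float, hi: float) -> List[float]:
    """Minimize f(x) = sum_i f_i(x_i) by n independent line searches, one per coordinate."""
    return [golden_section(f_i, lo, hi) for f_i in terms]
```

For non-separable functions the coordinate-wise optima no longer combine into the global optimum, which is why the superposition argument behind the BBH is limited to such special cases.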
But, not willing to spend enough time to invent such special methods, practitioners are driven toward using existing methods even if they are not optimal. In the following two sections we present new methods for analysing evolutionary algorithms on continuous (Section 2) and discrete (Section 3) search spaces.

2. Methods for continuous search spaces and general convergence aspects

It is a common belief that evolutionary optimization of real-valued objective functions in R^n search spaces is a specialty of evolution strategies (ES). While there are indeed state-of-the-art ES versions specially tailored for R^n that support this belief, it is historically not correct (for the history see [17]). The appearance of special ES versions for search in R^n may be regarded as a consequence of the theory: theoretical investigations on the behavior of EA in R^n search spaces have been done mainly in the field of ES. As to the other EA, there are only a few exceptions. Concerning real-coded GA, the work of Qi and Palmieri [52] should be mentioned here, where the effect of adaptive (real-valued) mutations on the convergence properties of a GA using fitness-proportional selection has been investigated. Only recently, Beyer and Deb [16] started first investigations on the (self-)adaptive behavior of real-coded GA populations and pointed out similarities concerning the convergence order of real-coded GA and ES.

In the early phase of ES, these EA were mainly developed and analyzed by engineers. A more or less system-theoretic approach aiming at the prediction of the EA's behavior as a dynamical system served as the central paradigm. That is, the usual way of thinking about a theory of EA is to consider the EA and the objective function f: R^n → R (the function to be optimized, often referred to as fitness function) as a dynamical (or evolutionary) system, the "EA system". The goal of this type of theory is therefore to model the real EA system and to predict certain aspects of its behavior.
Evolution strategies, as a special version of EA, operate on a population of μ parent individuals P = (a_1, ..., a_μ). In general, each individual a_m comprises a set of object parameters y ∈ R^n (i.e., the search space variables to be optimized), a secondary set of so-called (endogenous) strategy parameters s, and its fitness function value f(y): a_m = (y_m, s_m, f(y_m)). By producing λ offspring ã_l from the parental population P via recombination and mutation, an offspring population P̃ is formed. After that, truncation selection (sometimes called "breeding selection") is applied, resulting in a new population that forms the parent population at time step (or generation) t + 1. Depending on whether selection takes only P̃ into account or both the parent and the offspring population (P, P̃), one speaks of comma selection (denoted by (μ, λ)) and plus selection (denoted by (μ + λ)), respectively. The latter is an elitist selection scheme because it conserves the best individual (with respect to its measured fitness) found so far. From a formal point of view, the state of the EA at time t is fully determined by the state of the parent population P^(t). If all information that influences the future is included in the strategy parameters, the stochastic process describing the EA is a memoryless process (or first-order Markov process) whose transition operator will be called M^(t). Let p^(t)(P) be the state density at time step t. Then
Journal: Theor. Comput. Sci.
Volume: 287
Publication date: 2002